Abstract

We define the "shift-match number" of a binary string and we compute the probability of
occurrence of a given string as a subsequence in longer strings in terms of its shift-match number.
We thus prove that the string matching probabilities depend not only on the length of the shorter
string, but also on the equivalence class of the shorter string determined by its shift-match number.

PACS 02.10.Ox, 87.10.+e 



I. INTRODUCTION 



The sequence-matching problem can be defined as deciding whether a given string over
any alphabet occurs at least once as a subsequence in the members of another set of strings; it is a
problem of interest in informatics and genetics. From the informatics point of view, the
string-matching problem is essentially the development of fast algorithms for determining
exact or approximate occurrences of a short string in longer strings. The "words" searched
for are context and language dependent, hence they should not be considered merely as a
random selection from the set of n-digit sequences of the given alphabet [1]. On the other
hand, genomic interactions [2] can be modelled as a valued graph, or "network," in which
the nodes are strings of a given length. In a previous set of papers, a model network has
been proposed where an edge joining two nodes (a, b) is placed according to whether the
string a occurs at least once as a subsequence in the string b. The number of nodes

with a given number of outgoing and incoming edges determine the "out-degree" and "in- 
degree" distributions, which require the knowledge of the string matching probabilities for 
arbitrary strings of given lengths. A closely related problem in biology is the string alignment 
problem where it is important to determine the probabilities of chance multiple occurrences 
of a random string within a specific target string, allowing mismatches and gaps 0, 0| . 

In the present paper we show that for an alphabet of size 2, i.e., for binary strings, the
matching probabilities of short strings of length n in generic longer strings of length L fall
into equivalence classes with respect to a property of the short strings which we identify as
the "shift-match number." We obtain a recursive formula for the computation of the string
matching probabilities in terms of the shift-match number of the shorter string and the
length L of the longer strings. Counting the total number of occurrences (with multiplicities)
of strings of length n in strings of length L > n, however, washes out the fine structure
induced by the shift-match number.

The paper is organized as follows. Section 2 is devoted to the definition of the shift-match
number and to proving that the number of occurrences in longer strings depends on this
number. In Section 3 this is applied to the computation of the degree distribution.

II. THE SHIFT-MATCH NUMBER



Let a be a given binary string and let $P(a, L)$ be the probability of occurrence of a in
binary strings of length L. Since the number of distinct binary sequences of length L is $2^L$,
this probability can be expressed as

$$P(a, L) = N(a, L)\, 2^{-L}, \tag{2.1}$$

where $N(a, L)$ is the number of binary sequences of length L that contain a as a subsequence
at least once. Clearly, if a is a sequence of length n, then $N(a, L) = 0$ for $L < n$ and
$N(a, n) = 1$. Furthermore, the probability $P(a, L)$ approaches 1 as L increases.
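The definition in Eq. (2.1) can be checked by direct enumeration for small L. The sketch below is only an illustration: the function names are hypothetical, and "contains a as a subsequence" is read as a contiguous occurrence, which is the reading consistent with the example a = 110, L = 4 given in Section C.

```python
from itertools import product


def count_containing(a, L):
    """N(a, L): number of binary strings of length L containing a contiguously.

    Assumption of this sketch: an "occurrence" means a contiguous match.
    """
    return sum(1 for bits in product("01", repeat=L) if a in "".join(bits))


def prob_containing(a, L):
    """P(a, L) = N(a, L) / 2**L, as in Eq. (2.1)."""
    return count_containing(a, L) / 2 ** L


if __name__ == "__main__":
    # One column of Table I: a = 1111 for L = 4, ..., 7.
    for L in range(4, 8):
        print(L, count_containing("1111", L), prob_containing("1111", L))
```

Exhaustive enumeration costs $2^L$ string tests, so it is only practical for the short lengths considered here.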

We first computed $P(a, L)$ numerically for arbitrary a of a given length n and for increasing
values of L. We then observed that although all sequences a are equally likely to
occur, for fixed L the probability $P(a, L)$ shows a variation with respect to a which reveals
an equivalence class structure. For example, for sequences of length n = 4 we computed
the probabilities $P(a, L)$ for a as given in the first line of Table 1 and for
L = 4, ..., 10.

Note that for L = 4, the probabilities $P(a,4)$ are all equal to $0.0625 = 2^{-4}$. For L = 5,
we see that the sequences fall into two classes, with a = 1111 being distinguished from the
rest with respect to its probability of inclusion, $P(a, 5)$. For L = 6 and L = 7 we see that
the equivalence classes branch further, into three, then four. This structure stabilizes after
L = 7. We also note that

$$P(1000,L) = P(1100,L) = P(1110,L) > P(1001,L) = P(1011,L) = P(1101,L) > P(1010,L) > P(1111,L) \tag{2.2}$$

for each L. This observation was the starting point for our definition of the "shift-match
number," which explains perfectly the equivalence class structure in the $P(a, L)$'s.

The computation of the number of occurrences $N(a, L)$ is motivated by the counting
algorithm (2.9) displayed in the proof of Proposition 2.1. The final recursion formula (2.10)
requires a number of intermediate technical definitions, such as the shift-match indices $j_i$
and $j_{i,m}$ and the conditional numbers of occurrences $N(a, L, m)$, to be defined in Sections
2B and 2C.

In dealing with binary numbers we use the following convention. If a is an n-digit binary
number, the i'th digit of a counted from the left is denoted by $a_i$, i.e.,

$$a = a_1 a_2 \ldots a_n. \tag{2.3}$$

We note that if $\bar{a}$ denotes the binary number obtained from a by replacing zeros with ones
and vice versa, e.g., for a = 00100110, $\bar{a}$ = 11011001, then $P(\bar{a}, L) = P(a, L)$. Hence without
loss of generality we may assume that $a_1 = 1$ wherever convenient. We now define the shift-match
number s(a) for a given binary number a and the shift-match equivalence on the set
of binary numbers of a fixed length.



A. The shift-match number and shift-match equivalence 

Let a be the n-digit binary number $a = a_1 a_2 a_3 \ldots a_n$. Its shift-match number $s(a)$ is an
n-digit binary number $s(a) = s_1 s_2 s_3 \ldots s_n$, where the $s_j = s_j(a)$ are defined by

$$
\begin{aligned}
s_1(a) &= 1, \\
s_2(a) &= \delta(a_2,a_1)\,\delta(a_3,a_2) \cdots \delta(a_n,a_{n-1}), \\
s_3(a) &= \delta(a_3,a_1)\,\delta(a_4,a_2) \cdots \delta(a_n,a_{n-2}), \\
&\;\;\vdots \\
s_{n-1}(a) &= \delta(a_{n-1},a_1)\,\delta(a_n,a_2), \\
s_n(a) &= \delta(a_n,a_1),
\end{aligned} \tag{2.4}
$$

where $\delta(x,y)$ equals 1 if $x = y$ and 0 otherwise. The shift-match number induces an equivalence relation $\sim$ by

$$a \sim b \quad \text{if and only if} \quad s(a) = s(b). \tag{2.5}$$

In the definition above, the choice $s_1 = 1$ is a convention and ensures that $s(a)$ and a have
the same length. For $j \ge 2$, $s_j = 1$ if and only if the last $n-j+1$ digits of a match its
first $n-j+1$ digits. We illustrate the computation of $s(a)$ by an example. If a = 1011011,
then by shifting a to the right repeatedly, we obtain

    a_1 a_2 a_3 a_4 a_5 a_6 a_7
     1   0   1   1   0   1   1     s_1 = 1  (by convention)
         1   0   1   1   0   1     s_2 = 0  (no match with the first line)
             1   0   1   1   0     s_3 = 0  (no match with the first line)
                 1   0   1   1     s_4 = 1
                     1   0   1     s_5 = 0  (no match with the first line)
                         1   0     s_6 = 0  (no match with the first line)
                             1     s_7 = 1

Hence s(a) = 1001001. We note that in many cases s(s(a)) = s(a), i.e., the shift-match
number of a is frequently an element of the equivalence class of a, but this is not always
true: for a = 11011, s(a) = 10011 and s(s(a)) = 10001, where the latter does not belong
to the equivalence class of a.
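The definition of the shift-match number translates directly into a comparison of suffixes and prefixes. The following is a minimal sketch with a hypothetical function name; it represents binary numbers as Python strings of '0' and '1'.

```python
def shift_match(a):
    """Shift-match number s(a) of a binary string a, returned as a string.

    s_1 = 1 by convention; for j >= 2, s_j = 1 exactly when the last
    n - j + 1 digits of a coincide with its first n - j + 1 digits.
    """
    n = len(a)
    digits = ["1"]
    for j in range(2, n + 1):
        overlap = n - j + 1
        digits.append("1" if a[-overlap:] == a[:overlap] else "0")
    return "".join(digits)


if __name__ == "__main__":
    print(shift_match("1011011"))             # the worked example in the text
    print(shift_match(shift_match("11011")))  # s(s(a)) for a = 11011
```

Because s(a) is itself a binary string starting with 1, the function can be applied to its own output, which is how the remark about s(s(a)) can be checked.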

Since, as we show below, the probability of occurrence of a in longer strings depends
only on s(a), we shall suppress the dependence of the shift-match number on a and work
with s. The cardinality of the equivalence class whose shift-match number is s plays a crucial
role in the computation of the network connectivities; the corresponding probability will
be denoted by P(s). We give in Table 2 the list of possible shift-match numbers s for binary
sequences of length $n \le 6$, together with the corresponding equivalence class $\mathcal{E}(s)$ and its
probability P(s).
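The equivalence classes and their probabilities can be generated by grouping all strings of length n that start with 1 according to their shift-match number. The sketch below uses hypothetical helper names and lists class members in decimal form, following the convention of Table 2.

```python
from collections import defaultdict
from fractions import Fraction


def shift_match(a):
    """Shift-match number of the binary string a (s_1 = 1 by convention)."""
    n = len(a)
    return "1" + "".join(
        "1" if a[j - 1:] == a[:n - j + 1] else "0" for j in range(2, n + 1)
    )


def shift_match_classes(n):
    """Map each shift-match number to the decimal values of its members.

    Only strings with leading digit 1 are listed, i.e. half of each class.
    """
    groups = defaultdict(list)
    for value in range(2 ** (n - 1), 2 ** n):
        groups[shift_match(format(value, "b"))].append(value)
    return dict(groups)


def class_probabilities(n):
    """P(s): class cardinality divided by 2**n / 2."""
    half = 2 ** (n - 1)
    return {s: Fraction(len(m), half) for s, m in shift_match_classes(n).items()}


if __name__ == "__main__":
    for s, members in sorted(shift_match_classes(4).items()):
        print(s, class_probabilities(4)[s], members)
```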

B. The shift-match indices 

Recall that a is an n-digit binary string starting with 1, $s = s(a) = 1\, s_2 \ldots s_n$ is the
shift-match number of a, and s is also an n-digit binary number starting with 1. We shall now
give the definition of the shift-match indices $j_i$ and $j_{i,m}$ for $1 \le i \le n$ and $1 \le m \le n-1$.
We set $j_1 = 1$, and define $j_i$ by

$$j_i = 2^{i-1} - \sum_{k=2}^{i} s_k\, j_{i+1-k} \quad \text{for } i \ge 2. \tag{2.6}$$
The number $j_i$ has the following meaning: write $a = a_1 \ldots a_n$ in the first line, shift it to the
right once, filling with * at the left, and continue to get

    a_1  ...                    a_n
     *   a_1  ...               a_{n-1}
     *    *   a_1  ...          a_{n-2}
     ...
     *   ...   *   a_1  ...     a_{n-i+1}
     ...
     *   ...            *       a_1

If we replace the *'s with 0's or 1's, the ith line represents $2^{i-1}$ distinct strings. If the shifted
version of a has no match with the first line, i.e., when s = 10...0, then all strings are
distinct. On the other hand, when there are matches, i.e., when some of the $s_j$'s are nonzero,
there are duplications, and the number of new strings added at each line decreases. The
indices $j_i$ are exactly the numbers of distinct new strings added at line i.
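The recursion (2.6) and the "new strings per line" interpretation can be compared numerically. In the sketch below, `shift_match_indices` evaluates the recursion, while `new_prefixes` is a hypothetical brute-force counterpart: it counts the fillings of the i - 1 leading *'s for which a does not already occur further to the left. Both names, and the choice to read "new" as "first occurrence of a at position i", are assumptions made for illustration.

```python
from itertools import product


def shift_match_indices(s):
    """j_1, ..., j_n from j_1 = 1 and Eq. (2.6); s is the shift-match number."""
    digits = [int(c) for c in s]          # digits[k - 1] = s_k
    j = [1]
    for i in range(2, len(s) + 1):
        correction = sum(digits[k - 1] * j[i - k] for k in range(2, i + 1))
        j.append(2 ** (i - 1) - correction)
    return j


def new_prefixes(a, i):
    """Brute force: fillings of the i - 1 stars giving a first occurrence at line i."""
    count = 0
    for bits in product("01", repeat=i - 1):
        line = "".join(bits) + a
        if line.find(a) == i - 1:         # no earlier occurrence of a
            count += 1
    return count


if __name__ == "__main__":
    a, s = "1010", "1010"                 # 1010 is its own shift-match number
    print(shift_match_indices(s))
    print([new_prefixes(a, i) for i in range(1, len(a) + 1)])
```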
The "conditional" shift-match indices $j_{i,m}$ for $1 \le m \le n-1$ are defined by

$$
\begin{aligned}
j_{1,m} &= s_{n-m+1}, \\
j_{i,m} &= s_{n-m+i} - \sum_{k=2}^{i} s_k\, j_{i+1-k,m} && \text{for } 2 \le i \le m, \\
j_{i,m} &= 2^{i-m-1} - \sum_{k=2}^{i} s_k\, j_{i+1-k,m} && \text{for } i \ge m+1.
\end{aligned} \tag{2.7}
$$

The numbers $j_{i,m}$ correspond to distinct sequences added at line i whose first m digits match
the last m digits of a. For example, for n = 5 we have

$$
\begin{array}{lll}
j_1 = 1 & j_{1,1} = s_n & j_{1,2} = s_{n-1} \\
j_2 = 2 - s_2 j_1 & j_{2,1} = 1 - s_2 j_{1,1} & j_{2,2} = s_n - s_2 j_{1,2} \\
j_3 = 2^2 - s_2 j_2 - s_3 j_1 & j_{3,1} = 2 - s_2 j_{2,1} - s_3 j_{1,1} & j_{3,2} = 1 - s_2 j_{2,2} - s_3 j_{1,2}
\end{array}
$$

These indices will be used in the computation of the string matching probability for all short
strings whose shift-match number is s.
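A compact way to evaluate (2.7) is to note that the three cases differ only in their leading term. The sketch below uses a hypothetical function name and treats m = 0 as the unconditional case, in which the leading term is $2^{i-1}$ and the result reduces to the $j_i$ of (2.6).

```python
def conditional_indices(s, m):
    """j_{1,m}, ..., j_{n,m} of Eq. (2.7) for the shift-match number s.

    Leading term: s_{n-m+i} for i <= m, and 2**(i-m-1) for i >= m + 1.
    With m = 0 this reproduces the unconditional indices j_i.
    """
    n = len(s)
    digits = [int(c) for c in s]                 # digits[k - 1] = s_k
    j = []
    for i in range(1, n + 1):
        head = digits[n - m + i - 1] if i <= m else 2 ** (i - m - 1)
        tail = sum(digits[k - 1] * j[i - k] for k in range(2, i + 1))
        j.append(head - tail)
    return j


if __name__ == "__main__":
    for m in range(3):
        print(m, conditional_indices("100", m))
```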

C. The number of occurrences 

We define the conditional number of occurrences $N(a, L, m)$ for $m = 1, \ldots, n-1$
to be the number of sequences of length L that contain a as a subsequence and whose first m
digits match the last m digits of a.

For example, if a = 110 and L = 4, the sequences containing a are 0110, 1100, 1101 and
1110. Thus $N(a, 4) = 4$. $N(a, 4, 1)$ is the number of such sequences starting with the last
digit of a, i.e., 0; hence $N(a, 4, 1) = 1$. Similarly, $N(a, 4, 2) = 0$, since none of these starts
with 10.

For simplicity of presentation we write

$$j_{i,0} = j_i, \qquad N(a, L, 0) \equiv N(a, L).$$

We shall now give the expression of $N(a,L)$ in terms of $N(a,L-n)$ and $N(a,L-n,m)$ for
$m = 1, \ldots, n-1$. This result shows, in particular, that $P(a, L)$ depends only on the shift-match
equivalence class of a.

Proposition 2.1. Let a be a binary string of length n, let s be its shift-match number, and let the
shift-match indices $j_i$ and $j_{i,m}$ be defined by (2.6-7). Then

$$N(a,L,m) = 0, \quad m = 0, \ldots, n-1, \quad \text{for } L < n, \tag{2.8a}$$

$$N(a,n,m) = 1, \quad m = 0, \ldots, n-1, \tag{2.8b}$$

$$N(a,L,m) = \sum_{i=1}^{L-n+1} j_{i,m}\, 2^{L-n+1-i}, \quad m = 0, \ldots, n-1, \quad \text{for } n+1 \le L \le 2n-1, \tag{2.8c}$$

$$N(a,L,m) = \sum_{i=1}^{n} j_{i,m}\, 2^{L-n+1-i} + 2^{n-m} N(a,L-n,0) - \sum_{i=1}^{n} j_{i,m}\, N(a,L-n,i-1), \quad m = 0, \ldots, n-1, \quad \text{for } L > 2n-1, \tag{2.8d}$$

$$N(a,L) = \sum_{i=1}^{n} j_i\, 2^{L-n+1-i} + 2^{n} N(a,L-n) - \sum_{i=1}^{n} j_i\, N(a,L-n,i-1) \quad \text{for } L > 2n-1. \tag{2.8e}$$

Proof. We can enumerate the sequences of length L containing a by counting the number
of sequences of length L where the first occurrence of a starts at the kth digit, for
$k = 1, \ldots, L-n+1$, as seen below.

$$
\begin{array}{ccccccccc}
a_1 & a_2 & \cdots & a_{n-1} & a_n & * & * & \cdots & * \\
* & a_1 & \cdots & a_{n-2} & a_{n-1} & a_n & * & \cdots & * \\
* & * & \cdots & a_{n-3} & a_{n-2} & a_{n-1} & a_n & \cdots & * \\
\vdots & & & & & & & & \\
* & * & \cdots & * & a_1 & a_2 & a_3 & \cdots & * \\
* & * & \cdots & * & * & a_1 & a_2 & \cdots & * \\
* & * & \cdots & * & * & * & a_1 & \cdots & *
\end{array} \tag{2.9}
$$

By assigning the values 0 and 1 to the *'s in the first row we obtain $2^{L-n}$ sequences containing
a. At the second row, the same procedure gives again $2^{L-n}$ sequences, but duplicates of the
sequences already encountered in the first row should be eliminated. It can be seen that
sequences obtained from the first and second rows cannot coincide unless the last and the
first $n-1$ digits of a match, i.e., unless $s_2 = 1$. Thus the number of additional sequences is
exactly $j_2 \times 2^{L-n-1}$. By similar considerations it can be seen that at row i the number
of additional sequences is $j_i \times 2^{L-n+1-i}$. For the enumeration of the distinct sequences in the
last $L-2n+1$ rows, we note that if there were no duplications with the ones in the first
n rows, the contribution from this part would be $2^n \times N(a, L-n)$. Duplications with the
sequences already enumerated at the i'th row arise whenever the sequence of length $L-n$
starts with the last $i-1$ digits of a. But the number of such sequences containing
a is what we call $N(a, L-n, i-1)$, which is assumed to be known by the induction step.
Hence the proof is complete. □
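The counting argument can be turned into a short recursive routine and compared with exhaustive enumeration. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, occurrences are taken to be contiguous, and for L = n the sum over rows is simply evaluated with a single row, so that $N(a,n,m) = j_{1,m} = s_{n-m+1}$ for $m \ge 1$. That base case is what the definition of the conditional count gives (a itself must begin with its own last m digits), and it is an assumption of this sketch.

```python
from functools import lru_cache
from itertools import product


def shift_match(a):
    """Digits s_1..s_n of Eq. (2.4) as a list of ints (s_1 = 1)."""
    n = len(a)
    return [1] + [int(a[j - 1:] == a[:n - j + 1]) for j in range(2, n + 1)]


def conditional_indices(s, m):
    """j_{1,m}..j_{n,m} from Eqs. (2.6)-(2.7); m = 0 gives the plain j_i."""
    n, j = len(s), []
    for i in range(1, n + 1):
        head = s[n - m + i - 1] if i <= m else 2 ** (i - m - 1)
        j.append(head - sum(s[k - 1] * j[i - k] for k in range(2, i + 1)))
    return j


def count_recursive(a, L, m=0):
    """N(a, L, m) via the row-by-row counting of the proof."""
    n = len(a)
    s = shift_match(a)
    j = [conditional_indices(s, mm) for mm in range(n)]

    @lru_cache(maxsize=None)
    def N(length, mm):
        if length < n:
            return 0
        rows = min(length - n + 1, n)      # rows where a overlaps the first n digits
        total = sum(j[mm][i - 1] * 2 ** (length - n + 1 - i)
                    for i in range(1, rows + 1))
        if length >= 2 * n:                # first occurrence beyond row n
            total += 2 ** (n - mm) * N(length - n, 0)
            total -= sum(j[mm][i - 1] * N(length - n, i - 1)
                         for i in range(1, n + 1))
        return total

    return N(L, m)


def count_brute(a, L, m=0):
    """Direct enumeration of N(a, L, m), used only as a cross-check."""
    tail = a[len(a) - m:]                  # last m digits of a ('' for m = 0)
    count = 0
    for bits in product("01", repeat=L):
        x = "".join(bits)
        if a in x and x.startswith(tail):
            count += 1
    return count


if __name__ == "__main__":
    for L in range(4, 11):
        print(L, count_recursive("1010", L), count_brute("1010", L))
```

The recursion touches only O(L) values per string, whereas the enumeration grows as $2^L$, so the latter is useful only to validate the former for short strings.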
In Eqs. (2.8a-e), the numbers of occurrences are labelled by the specific short string a, but
it is clear from the proof that they depend only on the shift-match number of a. In the next
section we shall use the notation

$$N(s,L) = N(a,L,0),$$

where s = s(a) is the shift-match number of a.

Corollary 2.2. Let s be a shift-match number for binary strings of length n. Then for each
member of the equivalence class $\mathcal{E}(s)$, the number of occurrences in strings of length L is
given by

$$N(s,n) = 1, \tag{2.10a}$$

$$N(s,L) = \sum_{i=1}^{L-n+1} j_i\, 2^{L-n+1-i} \quad \text{for } n+1 \le L \le 2n-1, \tag{2.10b}$$

$$N(s,L) = \sum_{i=1}^{n} j_i\, 2^{L-n+1-i} + 2^{n} N(s,L-n) - \sum_{i=1}^{n} j_i\, N(s,L-n,i-1) \quad \text{for } L > 2n-1. \tag{2.10c}$$
The probability of occurrence of any element of the corresponding equivalence class is
$P(s,L) = N(s,L)/2^L$. The $P(s,L)$ for strings of length n = 2, ..., 6 in strings of length
$L \le 12$ have been computed using the formulas (2.10). These numbers have been checked
by direct enumeration of the numbers of matches of the given strings in all strings of length
L using Octave. The results obtained for the probability of occurrence of a given string with
shift-match number s in a string of length L are displayed in Figure 1.

We display the numerical values of the $N(s, L)$ in Table 3, where s denotes the shift-match
number of any representative of a given equivalence class and P(s) is the probability
of this equivalence class. The L values appear at the top of the respective columns. An
inspection of Table 3 shows that for each pair of lengths n, L, proceeding up the column of
shift-match numbers s, there exists a special value $s_c$ of s such that $N(s, L) = N(s_c, L)$ for
all $s \le s_c$. Moreover, the total probability $\sum_{s > s_c} P(s)$ of finding strings of length n with
shift-match numbers larger than $s_c$ decreases rapidly with n, as can be seen in Table 3.
Thus the expected value $\langle N(s, L) \rangle = \sum_s P(s)\, N(s, L) \to N(s_c, L)$ from below as n becomes
large, as shown in Table 4.

We will now use these findings to discuss the probability of occurrence of a given string
of length n in a string of length L, which is

$$p(n,L) = \sum_s P(s)\, P(s,L) = \langle P(s,L) \rangle. \tag{2.11}$$

In a previous paper by Mungan et al., an approximate expression has been given for the
probability p(n, L), namely,

$$p_M(n,L) \simeq 1 - (1 - 2^{-n})^{L-n+1}. \tag{2.12}$$

The frequencies $\langle N(s, L) \rangle$ reported in Table 4 have been obtained using Eqs. (2.10) (first line
for each n value) and from $2^L p_M(n, L)$ (second line for each n value). A comparison of the
numbers shows that they are extremely close, which is gratifying since the approximation
(2.12) has been derived by a completely independent route.
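The two kinds of entries being compared can be illustrated for small n and L. In the sketch below the exact average is obtained by brute-force enumeration over all $2^n$ short strings rather than from Eqs. (2.10), purely to keep the example short; the function names are hypothetical and occurrences are taken to be contiguous.

```python
from itertools import product


def count_containing(a, L):
    """N(a, L) by direct enumeration (contiguous occurrences assumed)."""
    return sum(1 for bits in product("01", repeat=L) if a in "".join(bits))


def mean_count(n, L):
    """Average of N(a, L) over all binary strings a of length n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return sum(count_containing(a, L) for a in strings) / len(strings)


def approx_count(n, L):
    """2**L * p_M(n, L), with p_M taken from Eq. (2.12)."""
    return 2 ** L * (1 - (1 - 2 ** -n) ** (L - n + 1))


if __name__ == "__main__":
    for n in (2, 3, 4):
        for L in range(n, n + 5):
            print(n, L, mean_count(n, L), round(approx_count(n, L), 3))
```

Averaging uniformly over all strings of length n is the same as weighting each shift-match class by P(s), since P(s) is the fraction of strings in that class.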

Furthermore, note that for n and $L - n$ large, the expression in Eq. (2.12) is approximately

$$p_M(n, L) \simeq (L - n + 1)/2^n \equiv p_e(n, L). \tag{2.13}$$

This corresponds to the limit where the corrections to the probability, coming from correlations
between successive shifted sequences, can be neglected. In fact, we find that
$p_e(n,L) = N(s_c, L)/2^L$ exactly, or $N(s_c, L) = 2^{L-n}(L - n + 1)$. It is once more instructive
to note that in all cases $N(s_c, L)$ is the frequency associated with the smallest shift-match
numbers.
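The passage from (2.12) to (2.13) can be made explicit with a binomial expansion. The shorthand $x = 2^{-n}$ and $k = L - n + 1$ is introduced here only for this sketch:

```latex
\begin{aligned}
p_M(n,L) &\simeq 1 - (1 - x)^{k}
          = kx - \binom{k}{2} x^{2} + \binom{k}{3} x^{3} - \cdots \\
         &= \frac{L-n+1}{2^{n}}
            \left[\, 1 - \frac{L-n}{2^{\,n+1}} + \cdots \right],
\qquad x = 2^{-n}, \quad k = L-n+1 .
\end{aligned}
```

The linear term alone reproduces $p_e(n,L)$, and the first correction is small whenever $(L - n)\,2^{-n-1} \ll 1$, so the linear form should be read as valid in that regime.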

Had we asked for the total number of multiple occurrences of a given string of length
n in a string of length L > n, we would have found that the sum of these numbers over
all sequences of length L depends only on L and n, and does not depend on the shift-match
number of the short sequence. This total value happens to be equal to the limiting values
given in the last line of Table 4, namely $N(s_c, L) = 2^{L-n}(L - n + 1)$.
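This statement is easy to verify by enumeration: each of the $L - n + 1$ starting positions is matched by exactly $2^{L-n}$ strings, whatever the short string is. The sketch below, with a hypothetical function name, counts overlapping contiguous occurrences with multiplicity.

```python
from itertools import product


def total_occurrences(a, L):
    """Occurrences of a, counted with multiplicity, summed over all strings of length L."""
    n = len(a)
    total = 0
    for bits in product("01", repeat=L):
        x = "".join(bits)
        # str.startswith(prefix, start) tests for a match beginning at index start
        total += sum(1 for k in range(L - n + 1) if x.startswith(a, k))
    return total


if __name__ == "__main__":
    L = 7
    for a in ("1000", "1001", "1010", "1111"):
        print(a, total_occurrences(a, L), 2 ** (L - len(a)) * (L - len(a) + 1))
```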

III. APPLICATIONS 

Assume that we are given a collection C of strings of lengths $L \le L_{\max}$ and let $N_L$ denote
the number of strings of length L. If a is a binary string in this collection, the degree of a,
d(a), is the number of sequences in C that contain a as a subsequence. The considerations
in Section 2 allow us to compute the degree distribution in C over its elements a.

Define d(s, L) as the expected number of occurrences of a string with shift-match number
s in $N_L$ strings of length L,

$$d(s,L) = P(s,L)\, N_L. \tag{3.1}$$

If the shift-match number of a is s, then the expected value of d(a) depends only on s, and
we denote it by d(s). It can be seen that

$$d(s) = \sum_{L=n}^{L_{\max}} d(s,L) = \sum_{L=n}^{L_{\max}} P(s,L)\, N_L. \tag{3.2}$$
A brief inspection of Table 3 shows that the degree of a node decreases with the string length
n and, for fixed n, with the shift-match number. Thus for fixed $L_{\max}$ we have the ordering

$$d(10) > d(11) > d(100) > d(101) > d(111) > d(1000) > d(1001) > d(1010) > d(1111) > \ldots$$

As the $N(s, L)$'s decrease with s for fixed L, the probabilities $P(s, L)$ also decrease with increasing
s. Thus, for any set of numbers $N_L$, the expected values d(s) form a decreasing sequence.
For lower values of s these numbers are strictly decreasing, but if the numbers $N(s, L)$ and
$N(s', L)$ coincide in the range $n \le L \le L_{\max}$, then d(s) and d(s') will be the same;
hence the equivalence classes s and s' will be indistinguishable with respect to their string-matching
probabilities. We can view the computation of the string matching probabilities
on a collection of binary strings as a splitting effect revealing a spectral structure.

In the numerical simulation of Ref. [3], faithfully reproduced by the analytical approach of
Ref. [4], the average degree was computed for strings of length n, and the degree distribution
exhibited peaks centered at these average values, with a certain variation about this value.
We now see that the degree distribution for strings of length n in fact splits into discrete
spectral lines, identified with different shift-match numbers. Thus, in the ideal case where
we can replace d(a) by d(s), the spectrum is discrete.

The strength of a spectral line corresponds to the number of strings $\nu(d(s))$ that have
degree d(s). If s is the shift-match number for a string of length n, then $\nu(d(s))$ is given
by

$$\nu(d(s)) = P(s)\, N_n, \tag{3.3}$$

where P(s) is the probability of the equivalence class as given in Table 2 and $N_n$ is the total
number of strings of length n. As an example, for s = 100 and $L_{\max} = 6$, by using Table 3
we easily get

$$d(100) = \frac{1}{8} N_3 + \frac{4}{16} N_4 + \frac{12}{32} N_5 + \frac{31}{64} N_6,$$

$$\nu(d(100)) = \frac{1}{2} N_3.$$
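The coefficients in this example are the probabilities P(s, L) for s = 100, and they can be regenerated for any representative of the class. The sketch below is illustrative only: the function names and the dictionary `sizes` (mapping each length L to a count $N_L$) are hypothetical, and the counts are obtained by enumeration rather than from Table 3.

```python
from fractions import Fraction
from itertools import product


def count_containing(a, L):
    """N(a, L) by direct enumeration (contiguous occurrences assumed)."""
    return sum(1 for bits in product("01", repeat=L) if a in "".join(bits))


def degree_coefficients(a, L_max):
    """P(s, L) = N(s, L) / 2**L for L = n, ..., L_max, using a as representative."""
    n = len(a)
    return {L: Fraction(count_containing(a, L), 2 ** L) for L in range(n, L_max + 1)}


def expected_degree(a, sizes):
    """d(s) of Eq. (3.2); sizes[L] is the (hypothetical) number N_L of strings of length L."""
    coeffs = degree_coefficients(a, max(sizes))
    return sum(coeffs[L] * N_L for L, N_L in sizes.items() if L >= len(a))


if __name__ == "__main__":
    # a = 100 is a representative of the class with shift-match number 100.
    print(degree_coefficients("100", 6))
    print(expected_degree("100", {3: 8, 4: 16, 5: 32, 6: 64}))
```

Using exact fractions keeps the coefficients in the same form as in the text, up to reduction to lowest terms.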



The results obtained for $\nu(d(s))$ with respect to d(s) are shown in Figure 2, where we
have chosen $N_n = L_0\, p^2 (1 - p)^n$, for p = 0.05 and $L_0 = 15000$, as in Refs. [3, 4]. Figure 2 may
be compared to Figure 3 of Ref. where $\nu(d)$, averaged over 500 random realisations of
C, has been plotted. Note that unless $L_{\max}$ is large enough, the splitting between the values
of d(s) for successive s does not show up, as can be seen from Table 3. Moreover, the
$\nu(d(s))$ values are degenerate for different values of s, for small n.

The total number q(n, L) of connections leading from strings of length n to strings of
length L > n, averaged over 1000 random realisations, is also displayed in Table 5. These
numbers q(n, L) should coincide with a weighted average of the d(s)'s over the shift-match numbers
s,

$$q(n, L) = \sum_{|s|=n} P(s)\, N_n\, N(s, L)\, 2^{-L}\, N_L, \tag{3.4}$$

where |s| = n denotes the sequences having shift-match number s of length n. The numbers
computed from (3.4) agree with the numbers given in Table 5, given that the numbers $N_n$
and $N_L$ are taken to be the number of sequences of length n and L respectively, averaged
over the ensemble of realisations.
TABLE I: The probabilities of occurrence of short strings a of length n = 4 starting with 1, in
generic strings of length L = 4, ..., 10.

    a          1000     1001     1010     1011     1100     1101     1110     1111
    P(a,4)     0.0625   0.0625   0.0625   0.0625   0.0625   0.0625   0.0625   0.0625
    P(a,5)     0.1250   0.1250   0.1250   0.1250   0.1250   0.1250   0.1250   0.0938
    P(a,6)     0.1875   0.1875   0.1719   0.1875   0.1875   0.1875   0.1875   0.1250
    P(a,7)     0.2500   0.2422   0.2188   0.2422   0.2500   0.2422   0.2500   0.1563
    P(a,8)     0.3086   0.2930   0.2656   0.2930   0.3086   0.2930   0.3086   0.1875
    P(a,9)     0.3633   0.3398   0.3086   0.3398   0.3633   0.3398   0.3633   0.2168
    P(a,10)    0.4141   0.3838   0.3486   0.3838   0.4141   0.3838   0.4141   0.2451

FIG. 1: The probability, P(s,L), of a given string with shift-match number s to be reproduced in
a randomly chosen string of length L, plotted as a function of L. The dotted line corresponds to
the expectation value of P(s,L) averaged over the shift-match numbers s. It should be remarked
that for n = 3, the branches for s = 100 and s = 101 are degenerate up to L = 4, where they split.
For n = 4, a first splitting occurs at L = 5 and further ones at L = 6 and 7. See Table 3 for
the degeneracies in the number of occurrences N(s,L) that give rise to this progressive splitting.
(Color online)
TABLE II: List of the shift-match equivalence classes of binary strings of length $n \le 6$. The
shift-match number is given in the first column. Following the convention $a_1 = 1$, only half of
the elements in each equivalence class are displayed and they are written in decimal form for
compactness. The probability P(s) is obtained as the ratio of the cardinality of $\mathcal{E}(s)$ to $2^n/2$.

    s         P(s)     E(s)
    10        1/2      {2}
    11        1/2      {3}
    100       2/4      {4,6}
    101       1/4      {5}
    111       1/4      {7}
    1000      3/8      {8,12,14}
    1001      3/8      {9,11,13}
    1010      1/8      {10}
    1111      1/8      {15}
    10000     6/16     {16,20,24,26,28,30}
    10001     5/16     {17,19,23,25,29}
    10010     2/16     {18,22}
    10101     1/16     {21}
    10011     1/16     {27}
    11111     1/16     {31}
    100000    10/32    {32,40,44,48,50,52,56,58,60,62}
    100001    11/32    {33,35,37,39,41,43,47,49,53,57,61}
    100010    3/32     {34,38,46}
    100011    3/32     {51,55,59}
    100100    2/32     {36,54}
    100101    1/32     {45}
    101010    1/32     {42}
    111111    1/32     {63}
TABLE III: In the table above the first column, labelled by s, is the shift-match number of the
members of the equivalence class, while the second column, labelled by P(s), is the probability of
occurrence of the corresponding equivalence class in strings of length n = 2, ..., 6. The numbers
N(s, L) of occurrences of any string with shift-match number s in sequences of length L are given
in the columns labelled by L = 2, ..., 12. Clearly nonzero occurrences start at L = n. The numbers
of occurrences decrease as the shift-match numbers increase. The probabilities of occurrence,
P(s, L), are found by dividing the numbers in the column labelled by L by $2^L$.
TABLE IV: In this table we present the values of $\langle N(s, L) \rangle = \langle P(s, L) \rangle\, 2^L$, obtained in two different
ways. For each n, the first line corresponds to Eq. (2.10), and the second line to the value obtained
from $p_M(n,L)\, 2^L$ (Eq. (2.12)). See text.
TABLE V: In this table the average simulation results are given for 1000 different realisations
coming from the Balcan-Erzan model [3], assuming a length distribution $N_n = L_0\, p^2 (1 - p)^n$ for
p = 0.05 and $L_0 = 15000$. The numbers given above show the average number of outgoing bonds
from strings of length n (where n is indicated in the first column) to strings of length L (first row).
FIG. 2: The degree distribution $\nu(d(s))$ of a given string of length n with shift-match number
s versus the degree d(s), as computed from (3.3) and (3.2) respectively. In the computation of the
$\nu(d(s))$'s and d(s)'s, the number of sequences of length n, $N_n$, is taken to be $L_0\, p^2 (1 - p)^n$ for
p = 0.05 and $L_0 = 15000$, where the parameters have been chosen to facilitate comparison with
the numerical simulation of Ref. [3].